Generic Pattern Trees for Exhaustive Exceptional Model Mining
نویسندگان
چکیده
Exceptional model mining has been proposed as a variant of subgroup discovery especially focusing on complex target concepts. Currently, efficient mining algorithms are limited to heuristic (non exhaustive) methods. In this paper, we propose a novel approach for fast exhaustive exceptional model mining: We introduce the concept of valuation bases as an intermediate condensed data representation, and present the general GP-growth algorithm based on FP-growth. Furthermore, we discuss the scope of the proposed approach by drawing an analogy to data stream mining and provide examples for several different model classes. Runtime experiments show improvements of more than an order of magnitude in comparison to a naive exhaustive depth-first search.
منابع مشابه
DMTL : A Generic Data Mining Template Library
FPM(Frequent Pattern Mining) is a data mining paradigm to extract informative patterns from massive datasets. Researchers have developed numerous novel algorithms to extract these patterns. Unfortunately, the focus primarily has been on a small set of popular patterns (itemsets, sequences, trees and graphs) and no framework for integrating the FPM process has been attempted. In this paper we in...
متن کاملExceptional Model Mining with Tree-Constrained Gradient Ascent
Exceptional Model Mining (EMM) generalizes the wellknown data mining task of Subgroup Discovery (SD). Given a model class of interest, the goal of EMM is to find subgroups of the data for which a model fitted to the subgroup deviates substantially from the global model. In both SD and EMM, heuristic search is often employed to circumvent the problem that exhaustive search of the subgroup descri...
متن کاملAny-time Diverse Subgroup Discovery with Monte Carlo Tree Search
The discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that enables to elicit such interesting hypotheses from labeled data. A question remains fairly open: How to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible? Exist...
متن کاملA Framework for Discovery and Diagnosis of Behavioral Transitions in Event-streams
Date stream mining techniques can be used in tracking user behaviors as they attempt to achieve their goals. Quality metrics over stream-mined models identify potential changes in user goal attainment. When the quality of some data mined models varies significantly from nearby models—as defined by quality metrics—then the user’s behavior is automatically flagged as a potentially significant beh...
متن کاملNon-redundant Subgroup Discovery in Large and Complex Data
Large and complex data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly red...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012